Introduction

In this project I perform a bulk RNA-seq analysis on three tissues (brain, liver and lung) to extract differentially expressed genes. I perform the analysis without excluding rRNA, mRNA, pseudogenes or non-canonical chromosomes. The aim of this work is to understand whether the methods seen during the lessons are robust enough to remain reliable in the presence of additional sources of variation. I also want to show that this workflow is able to find meaningful differentially expressed genes between the three tissues.

Libraries used in the analysis

Loading the data

The tissues assigned to me are the following: brain, liver and lung. The first step is to load the corresponding datasets:

rse_brain <- readRDS("rse_brain.RDS")
rse_liver <- readRDS("rse_liver.RDS")
rse_lung <- readRDS("rse_lung.RDS")

Then I need to transform the counts, because they are stored as overall read coverage over exons:

assays(rse_brain)$counts <- transform_counts(rse_brain)
assays(rse_liver)$counts <- transform_counts(rse_liver)
assays(rse_lung)$counts <- transform_counts(rse_lung)

Quality check

Each replicate needs to be checked against some quality parameters before performing any type of analysis:

  • RIN > 6

  • % of mapped reads > 85%

  • % of rRNA reads never higher than 10%

I checked each parameter for each replicate:

#Brain
colData(rse_brain)[101,]$'recount_qc.star.uniquely_mapped_reads_%_both' 
## [1] 88.6
colData(rse_brain)[101,]$gtex.smrin 
## [1] 8.8
colData(rse_brain)[101,]$gtex.smrrnart
## [1] 0.0327779
#Liver
colData(rse_liver)[103,]$'recount_qc.star.uniquely_mapped_reads_%_both' 
## [1] 92
colData(rse_liver)[103,]$gtex.smrin
## [1] 6.6
colData(rse_liver)[103,]$gtex.smrrnart
## [1] 0.0158024
#Lung
colData(rse_lung)[104,]$'recount_qc.star.uniquely_mapped_reads_%_both' 
## [1] 92
colData(rse_lung)[104,]$gtex.smrin 
## [1] 6.7
colData(rse_lung)[104,]$gtex.smrrnart
## [1] 0.00239451

If a replicate did not pass the check, I moved on to the next one. For example:

colData(rse_lung)[100,]$'recount_qc.star.uniquely_mapped_reads_%_both' 
## [1] 88.2
colData(rse_lung)[100,]$gtex.smrin
## [1] 5.5
colData(rse_lung)[100,]$gtex.smrrnart
## [1] 0.00852349
colData(rse_liver)[101,]$'recount_qc.star.uniquely_mapped_reads_%_both' 
## [1] 90.5
colData(rse_liver)[101,]$gtex.smrin
## [1] 6.2
colData(rse_liver)[101,]$gtex.smrrnart
## [1] 0.0260051
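The three thresholds listed above can be wrapped in a small helper, so that every candidate replicate is checked in the same way. This is only a minimal sketch: `passes_qc` is a hypothetical function (it is not part of recount3 or edgeR), and I assume that `gtex.smrrnart` is stored as a fraction, so the 10% limit becomes 0.10.

```r
# Hypothetical helper: TRUE when a replicate passes all three thresholds.
# Assumption: the rRNA rate is a fraction (0.10 = 10%), the mapped reads a percentage.
passes_qc <- function(rin, mapped_pct, rrna_fraction) {
  rin > 6 & mapped_pct > 85 & rrna_fraction < 0.10
}

# Values copied from the checks shown above
qc <- data.frame(
  sample        = c("brain_101", "liver_103", "lung_104", "lung_100", "liver_101"),
  rin           = c(8.8, 6.6, 6.7, 5.5, 6.2),
  mapped_pct    = c(88.6, 92, 92, 88.2, 90.5),
  rrna_fraction = c(0.0327779, 0.0158024, 0.00239451, 0.00852349, 0.0260051)
)
qc$pass <- passes_qc(qc$rin, qc$mapped_pct, qc$rrna_fraction)
print(qc)
```

With these values only the lung replicate 100 is rejected, because its RIN (5.5) is below the threshold.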

Now I create new RSE objects with the samples that passed the quality check:

rse_brain_selected <- rse_brain[,c(98,99,100)]
rse_liver_selected <- rse_liver[,c(98,100,101)]
rse_lung_selected <- rse_lung[,c(98,101,102)]

Count table

Now it is necessary to extract the count data for each tissue from the selected RSEs:

counts_brain_selected <- assays(rse_brain_selected)$counts
counts_liver_selected <- assays(rse_liver_selected)$counts
counts_lung_selected <- assays(rse_lung_selected)$counts

Now it is possible to combine all the samples into a single count table and to build a “DGEList” object from it:

final_count_table <- cbind(counts_brain_selected, counts_liver_selected, counts_lung_selected)
colnames(final_count_table) <- c("Brain98", "Brain99", "Brain100", "Liver98", "Liver100", "Liver101", "Lung98", "Lung101", "Lung102")
rownames(final_count_table) <- rowData(rse_brain_selected)$gene_name
size <- colSums(final_count_table)
y <- DGEList(counts=final_count_table)
group <- as.factor(c("Brain", "Brain", "Brain", "Liver", "Liver", "Liver", "Lung", "Lung", "Lung"))
y$samples$group <- group

I also add other important quality information:

y$samples$rin <- as.factor(c(colData(rse_brain_selected)$gtex.smrin,colData(rse_liver_selected)$gtex.smrin, colData(rse_lung_selected)$gtex.smrin))

y$samples$slice <- as.factor(c(colData(rse_brain_selected)$gtex.smtsd,colData(rse_liver_selected)$gtex.smtsd,colData(rse_lung_selected)$gtex.smtsd))

y$samples$sex <- as.factor(c(colData(rse_brain_selected)$gtex.sex, colData(rse_liver_selected)$gtex.sex, colData(rse_lung_selected)$gtex.sex))

y$samples$age <- as.factor(c(colData(rse_brain_selected)$gtex.age, colData(rse_liver_selected)$gtex.age, colData(rse_lung_selected)$gtex.age))

y$samples$rRNA <- as.factor(c(colData(rse_brain_selected)$gtex.smrrnart,colData(rse_liver_selected)$gtex.smrrnart, colData(rse_lung_selected)$gtex.smrrnart))

y$samples$mapped <- as.factor(c(colData(rse_brain_selected)$"recount_qc.star.uniquely_mapped_reads_%_both",colData(rse_liver_selected)$"recount_qc.star.uniquely_mapped_reads_%_both", colData(rse_lung_selected)$"recount_qc.star.uniquely_mapped_reads_%_both"))

y$samples$chrm <- as.factor(c(colData(rse_brain_selected)$"recount_qc.aligned_reads%.chrm", colData(rse_liver_selected)$"recount_qc.aligned_reads%.chrm", colData(rse_lung_selected)$"recount_qc.aligned_reads%.chrm"))

Now I can check the final count table:

y
## An object of class "DGEList"
## $counts
##              Brain98 Brain99 Brain100 Liver98 Liver100 Liver101 Lung98 Lung101
## SNX18P15           0       0        0       0        0        0      0       0
## SNX18P16           0       0        0       0        0        0      0       0
## ANKRD20A12P        0       0        0       0        0        0      0       0
## ANKRD20A15P        0       0        0       0        0        0      0       0
## LOC105379272       0       0        0       0        0        0      0       0
##              Lung102
## SNX18P15           0
## SNX18P16           0
## ANKRD20A12P        0
## ANKRD20A15P        0
## LOC105379272       0
## 54037 more rows ...
## 
## $samples
##          group lib.size norm.factors rin                           slice sex
## Brain98  Brain 32205953            1 7.8 Brain - Putamen (basal ganglia)   2
## Brain99  Brain 30288918            1 7.1              Brain - Cerebellum   1
## Brain100 Brain 27710552            1 6.9            Brain - Hypothalamus   1
## Liver98  Liver 28097679            1   7                           Liver   2
## Liver100 Liver 27203185            1 6.4                           Liver   1
## Liver101 Liver 28375615            1 6.2                           Liver   2
## Lung98    Lung 32932931            1 6.1                            Lung   1
## Lung101   Lung 33983555            1 7.3                            Lung   2
## Lung102   Lung 30548995            1 6.9                            Lung   1
##            age       rRNA mapped  chrm
## Brain98  60-69  0.0333394   89.6 14.41
## Brain99  60-69  0.0106194     92  7.53
## Brain100 60-69  0.0438324   88.7 21.35
## Liver98  60-69  0.0143595   88.7 16.21
## Liver100 40-49  0.0148645   88.3  22.4
## Liver101 50-59  0.0260051   90.5  21.3
## Lung98   60-69 0.00328285   92.6  2.37
## Lung101  60-69 0.00209038   85.8  2.14
## Lung102  60-69 0.00466061   91.2  4.04

Removing low expressed genes

Genes that have very low counts across all the libraries should be removed prior to downstream analysis. This is justified on both biological and statistical grounds. From biological point of view, a gene must be expressed at some minimal level before it is likely to be translated into a protein or to be considered biologically important. From a statistical point of view, genes with consistently low counts are very unlikely be assessed as significantly DE because low counts do not provide enough statistical evidence for a reliable judgement to be made. Such genes can therefore be removed from the analysis without any loss of information. - From reads to genes to pathways: differential expression analysis of RNA-Seq experiments using Rsubread and the edgeR quasi-likelihood pipeline. Yunshun Chen, Aaron T. L. Lun and Gordon K. Smyth

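First, I look at the number of genes that have zero counts in all nine samples. Then the filterByExpr function flags the genes with sufficiently high counts, and the resulting logical vector keep.exprs is used to remove all the others: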

table(rowSums(y$counts==0)==9)
## 
## FALSE  TRUE 
## 39732 14310
keep.exprs <- filterByExpr(y, group=group)
y <- y[keep.exprs, keep.lib.sizes=FALSE]

LogCPM

LogCPM is calculated by dividing the number of reads of a gene by the total number of reads in the sample, multiplying by one million and then applying a log2 transformation. This normalizes the expression data for the size of the library, allowing a more accurate comparison between samples sequenced at different depths.

logcpm_before <- cpm(y, log=TRUE)
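The calculation described above can be sketched in base R on a toy matrix. The counts and the pseudocount of 1 are assumptions made for illustration; edgeR's `cpm(log=TRUE)` uses a prior count (2 by default) that is scaled by library size, so its values will not match this simplified version exactly.

```r
# Toy count matrix: 3 genes x 2 samples, the second sequenced 20 times deeper
counts <- matrix(
  c(10, 0, 90,
    200, 0, 1800),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("S1", "S2"))
)

lib_size <- colSums(counts)                          # total reads per sample
cpm_manual <- sweep(counts, 2, lib_size, "/") * 1e6  # counts per million
pseudo <- 1                                          # assumed pseudocount, avoids log2(0)
logcpm_manual <- log2(cpm_manual + pseudo)

print(cpm_manual)
print(round(logcpm_manual, 2))
```

Although the raw counts differ by a factor of 20, the two samples get identical CPM values, which is exactly the effect of scaling by library size.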

Creating a boxplot of the LogCPM

brain <- c('Brain98', 'Brain99', 'Brain100')
liver <- c('Liver98', 'Liver100', 'Liver101')
lung <- c('Lung98', 'Lung101', 'Lung102')
myColors <- ifelse(colnames(logcpm_before) %in% brain, '#99CCFF',
                   ifelse(colnames(logcpm_before) %in% liver, '#0099FF', '#003399'))
boxplot(logcpm_before, notch=TRUE, xlab='Replicates', ylab='LogCPM', main='LogCPM before TMM normalization', col=myColors, varwidth=TRUE)

Check the values of the medians:

for (i in 1:9){
  print(median(logcpm_before[,i]))
}
## [1] 3.272081
## [1] 3.264139
## [1] 3.507099
## [1] 2.337198
## [1] 2.151652
## [1] 2.45868
## [1] 3.370884
## [1] 3.095907
## [1] 2.932823

Normalization via TMM

TMM normalization is a simple and effective method for estimating relative RNA production levels from RNA-seq data. The TMM method estimates scale factors between samples that can be incorporated into currently used statistical methods for DE analysis. - A scaling normalization method for differential expression analysis of RNA-seq data. Mark D Robinson & Alicia Oshlack 

The next step is to apply TMM via the calcNormFactors function in edgeR.

y <- calcNormFactors(y, method = "TMM")
logcpm_after <- cpm(y, log=TRUE)
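To see how the TMM factors enter the calculation, the sketch below multiplies each library size by its normalization factor to obtain the effective library size used for CPM. The numbers are copied from the `$samples` table printed with the fitted model later in this report; the variable names are mine. edgeR scales the factors so that they multiply to 1, which can be checked directly.

```r
lib_size <- c(Brain98 = 32103622, Brain99 = 30185833, Brain100 = 27603132,
              Liver98 = 28074401, Liver100 = 27188143, Liver101 = 28357398,
              Lung98 = 32883508, Lung101 = 33948475, Lung102 = 30516905)
norm_factors <- c(Brain98 = 1.2027051, Brain99 = 1.2871794, Brain100 = 1.3159321,
                  Liver98 = 0.7049238, Liver100 = 0.6373379, Liver101 = 0.7349796,
                  Lung98 = 1.2436342, Lung101 = 1.1244984, Lung102 = 1.0629911)

# Effective library size = raw library size * TMM factor
effective_lib_size <- lib_size * norm_factors

# CPM of WASH7P in Brain98 (raw count 1245), before and after TMM, without log
cpm_raw <- 1245 / lib_size[["Brain98"]] * 1e6
cpm_tmm <- 1245 / effective_lib_size[["Brain98"]] * 1e6

print(round(effective_lib_size))
print(c(raw = cpm_raw, tmm = cpm_tmm))
print(prod(norm_factors))  # close to 1 by construction
```

The liver samples have factors below 1, so their effective library sizes shrink and their CPM values rise, which is consistent with the liver medians moving up after normalization.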

Now I can visualize and compare the resulting boxplot after the TMM normalization:

#Same as before
boxplot(logcpm_after,notch=T,xlab='Replicates',ylab='LogCPM', main='LogCPM after TMM normalization',col=myColors, varwidth=T)

Check the new values of the medians:

for (i in 1:9){
  print(median(logcpm_after[,i]))
}
## [1] 3.007287
## [1] 2.902228
## [1] 3.113191
## [1] 2.835462
## [1] 2.79307
## [1] 2.897711
## [1] 3.058075
## [1] 2.92747
## [1] 2.8449

Multidimensional scaling plot

The first step is to design the linear model. The intercept is not needed here, since none of the three tissues is a natural reference: without it, each coefficient represents one tissue.

design <- model.matrix(~0+group, data=y$samples)
colnames(design) <- levels(y$samples$group)
design
##          Brain Liver Lung
## Brain98      1     0    0
## Brain99      1     0    0
## Brain100     1     0    0
## Liver98      0     1    0
## Liver100     0     1    0
## Liver101     0     1    0
## Lung98       0     0    1
## Lung101      0     0    1
## Lung102      0     0    1
## attr(,"assign")
## [1] 1 1 1
## attr(,"contrasts")
## attr(,"contrasts")$group
## [1] "contr.treatment"
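The effect of dropping the intercept can be seen with base R alone. The sketch below uses a toy factor with two replicates per tissue (an assumption made to keep the output short) and compares the two parameterizations.

```r
group <- factor(rep(c("Brain", "Liver", "Lung"), each = 2))

# Without intercept: one column per tissue, each coefficient is a tissue mean
design_means <- model.matrix(~ 0 + group)
colnames(design_means) <- levels(group)

# With intercept: the first level (Brain) becomes the reference,
# and the other columns measure differences from it
design_ref <- model.matrix(~ group)

print(design_means)
print(design_ref)
```

With the first form, any pair of tissues can be compared with a simple contrast vector such as c(1,-1,0), without having to remember which tissue was chosen as the reference.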

The aim of an MDS plot is to determine the major sources of variation in the data. If the data are of good quality, I expect the greatest source of variation to be the difference between the three tissues.

logcpm <- cpm(y, log=TRUE)
plotMDS(logcpm, labels=group, main = 'Multidimensional scaling plot: gene expression profiles - group')

In the case of brain, one sample lies a little farther from the other two. It is therefore better to check the other quality information, to understand which may be the source of this variability.

plotMDS(logcpm_after, labels=y$samples$rRNA, main = 'Multidimensional scaling plot of distances between gene expression profiles - rRNA% label')

plotMDS(logcpm_after, labels=y$samples$chrm, main = 'Multidimensional scaling plot of distances between gene expression profiles - chrm% label')

plotMDS(logcpm_after, labels=y$samples$slice, main = 'Multidimensional scaling plot of distances between gene expression profiles - slice label')

plotMDS(logcpm_after, labels=y$samples$age, main = 'Multidimensional scaling plot of distances between gene expression profiles - age label')

plotMDS(logcpm_after, labels=y$samples$sex, main = 'Multidimensional scaling plot of distances between gene expression profiles - sex label')

The tissues cluster very well.

Estimating dispersion

Biological CV (BCV) is the coefficient of variation with which the (unknown) true abundance of the gene varies between replicate RNA samples. BCV is therefore likely to be the dominant source of uncertainty for high-count genes, so reliable estimation of BCV is crucial for realistic assessment of differential expression in RNA-Seq experiments. If the abundance of each gene varies between replicate RNA samples in such a way that the genewise standard deviations are proportional to the genewise means, a commonly occurring property of measurements on physical quantities, then it is reasonable to suppose that BCV is approximately constant across genes. - Differential expression analysis of multifactor RNA-Seq experiments with respect to biological variation. Davis J. McCarthy, Yunshun Chen and Gordon K. Smyth

Single-gene dispersion estimates are not reliable, so it is better to compare them with the estimated trend. The single estimates are corrected by shrinking them toward the trend, which reduces the effect of sampling variation. The next step is therefore to compute the BCV: the correction is derived from the trend curve, which shows the relationship between mean and variance.

y <- estimateDisp(y, design)
plotBCV(y)

The “Common” line is only a little above 0.5, even though the analysis includes samples from donors that differ in age and sex and, in the case of brain, in the region sampled (slice), all of which increase the biological variability.
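The link between dispersion and BCV can be made explicit: under the negative binomial model the squared coefficient of variation of a count is 1/mean + dispersion, and the BCV is the square root of the dispersion. The sketch below applies this to the first five values reported under `$dispersion` in the fitted model output of the next section; `nb_cv` is a hypothetical helper written for this illustration.

```r
# Dispersions copied from the $dispersion element of the fitted model
dispersion <- c("MIR6859-1" = 0.3926254, WASH7P = 0.1708760, SEPT14P18 = 0.3996547,
                CICP27 = 0.3919609, LOC729737 = 0.1639608)

# BCV is the square root of the dispersion
bcv <- sqrt(dispersion)
print(round(bcv, 3))

# Hypothetical helper: total CV of a negative binomial count with mean mu.
# The 1/mu term is the technical (Poisson) part, phi the biological part.
nb_cv <- function(mu, phi) sqrt(1 / mu + phi)

# For high counts the total CV approaches the BCV
print(nb_cv(c(10, 100, 10000), dispersion[["WASH7P"]]))
```

This is why the BCV dominates the uncertainty of highly expressed genes: the 1/mean term vanishes as the counts grow, while the dispersion term does not.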

Linear model

The next step is to fit a quasi-likelihood negative binomial generalized log-linear model to the count data, and then to conduct gene-wise statistical tests for a given coefficient or contrast.

fit <- glmQLFit(y, design)
fit
## An object of class "DGEGLM"
## $coefficients
##               Brain      Liver       Lung
## MIR6859-1 -14.64538 -14.607522 -14.692751
## WASH7P    -10.12656 -10.369724 -10.262623
## SEPT14P18 -14.47832 -13.576960 -14.195668
## CICP27    -14.80306 -14.544675 -14.811911
## LOC729737 -11.37611  -9.939963  -9.555298
## 23854 more rows ...
## 
## $fitted.values
##              Brain98    Brain99   Brain100    Liver98   Liver100    Liver101
## MIR6859-1   16.68522   16.79039   15.69678   8.885085   7.779617    9.357301
## WASH7P    1544.39713 1554.13253 1452.90651 620.702803 543.475944  653.691275
## SEPT14P18   19.74676   19.87124   18.57696  25.043248  21.927407   26.374221
## CICP27      14.22889   14.31858   13.38596   9.466053   8.288302    9.969145
## LOC729737  442.56881  445.35863  416.35088 953.994217 835.299769 1004.696116
##               Lung98    Lung101    Lung102
## MIR6859-1   16.84706   15.72651   13.36360
## WASH7P    1427.64729 1332.69025 1132.45311
## SEPT14P18   27.79918   25.95018   22.05115
## CICP27      14.93647   13.94300   11.84806
## LOC729737 2896.23243 2703.59546 2297.37937
## 23854 more rows ...
## 
## $deviance
##  MIR6859-1     WASH7P  SEPT14P18     CICP27  LOC729737 
##  2.3315482  0.8392765  3.8661459  5.1813846 12.7491286 
## 23854 more elements ...
## 
## $method
## [1] "oneway"
## 
## $counts
##           Brain98 Brain99 Brain100 Liver98 Liver100 Liver101 Lung98 Lung101
## MIR6859-1      12      18       19      16        6        4     17      16
## WASH7P       1245    1943     1371     567      532      724   1387    1497
## SEPT14P18      28       9       21      30       28       14     13      26
## CICP27          6      23       13      13       12        2     15      16
## LOC729737     380     638      295     533     1651      468   2142    4334
##           Lung102
## MIR6859-1      13
## WASH7P       1025
## SEPT14P18      34
## CICP27         10
## LOC729737    1510
## 23854 more rows ...
## 
## $unshrunk.coefficients
##               Brain      Liver       Lung
## MIR6859-1 -14.65453 -14.616329 -14.702344
## WASH7P    -10.12666 -10.369851 -10.262737
## SEPT14P18 -14.48606 -13.580099 -14.201513
## CICP27    -14.81378 -14.552991 -14.822714
## LOC729737 -11.37646  -9.940046  -9.555354
## 23854 more rows ...
## 
## $df.residual
## [1] 6 6 6 6 6
## 23854 more elements ...
## 
## $design
##          Brain Liver Lung
## Brain98      1     0    0
## Brain99      1     0    0
## Brain100     1     0    0
## Liver98      0     1    0
## Liver100     0     1    0
## Liver101     0     1    0
## Lung98       0     0    1
## Lung101      0     0    1
## Lung102      0     0    1
## attr(,"assign")
## [1] 1 1 1
## attr(,"contrasts")
## attr(,"contrasts")$group
## [1] "contr.treatment"
## 
## 
## $offset
##          [,1]     [,2]     [,3]    [,4]     [,5]     [,6]     [,7]     [,8]
## [1,] 17.46905 17.47534 17.40799 16.8007 16.66784 16.85249 17.52652 17.45769
##          [,9]
## [1,] 17.29488
## attr(,"class")
## [1] "CompressedMatrix"
## attr(,"Dims")
## [1] 5 9
## attr(,"repeat.row")
## [1] TRUE
## attr(,"repeat.col")
## [1] FALSE
## 23854 more rows ...
## 
## $dispersion
## [1] 0.3926254 0.1708760 0.3996547 0.3919609 0.1639608
## 23854 more elements ...
## 
## $prior.count
## [1] 0.125
## 
## $AveLogCPM
## [1] -1.0183123  5.1493928 -0.2049942 -1.1096182  5.4448313
## 23854 more elements ...
## 
## $df.residual.zeros
## [1] 6 6 6 6 6
## 23854 more elements ...
## 
## $df.prior
## [1] 3.942272
## 
## $var.post
## MIR6859-1    WASH7P SEPT14P18    CICP27 LOC729737 
## 0.5313171 0.2857667 0.6768771 0.8188397 1.4836359 
## 23854 more elements ...
## 
## $var.prior
## MIR6859-1    WASH7P SEPT14P18    CICP27 LOC729737 
## 0.7485407 0.5078020 0.7263704 0.7507707 0.5077233 
## 23854 more elements ...
## 
## $samples
##          group lib.size norm.factors rin                           slice sex
## Brain98  Brain 32103622    1.2027051 7.8 Brain - Putamen (basal ganglia)   2
## Brain99  Brain 30185833    1.2871794 7.1              Brain - Cerebellum   1
## Brain100 Brain 27603132    1.3159321 6.9            Brain - Hypothalamus   1
## Liver98  Liver 28074401    0.7049238   7                           Liver   2
## Liver100 Liver 27188143    0.6373379 6.4                           Liver   1
## Liver101 Liver 28357398    0.7349796 6.2                           Liver   2
## Lung98    Lung 32883508    1.2436342 6.1                            Lung   1
## Lung101   Lung 33948475    1.1244984 7.3                            Lung   2
## Lung102   Lung 30516905    1.0629911 6.9                            Lung   1
##            age       rRNA mapped  chrm
## Brain98  60-69  0.0333394   89.6 14.41
## Brain99  60-69  0.0106194     92  7.53
## Brain100 60-69  0.0438324   88.7 21.35
## Liver98  60-69  0.0143595   88.7 16.21
## Liver100 40-49  0.0148645   88.3  22.4
## Liver101 50-59  0.0260051   90.5  21.3
## Lung98   60-69 0.00328285   92.6  2.37
## Lung101  60-69 0.00209038   85.8  2.14
## Lung102  60-69 0.00466061   91.2  4.04

The next step is to design the contrasts: I choose what to compare by specifying the corresponding columns of the design matrix.

A contrast is a numeric vector or matrix specifying one or more contrasts of the linear model coefficients to be tested equal to zero. The order of the columns in the design matrix is Brain, Liver, Lung.

qlfBrainLiver <- glmQLFTest(fit, contrast=c(1,-1,0))
qlfBrainLung <- glmQLFTest(fit, contrast=c(1,0,-1))
qlfLiverLung <- glmQLFTest(fit, contrast=c(0,1,-1))
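What a contrast vector does can be checked by hand on one gene. The sketch below takes the coefficients of LOC729737 printed in the fit output above and applies the three contrasts; edgeR stores the coefficients on the natural-log scale, so dividing by log(2) gives a log2 fold change. The object names are mine, and the result is only expected to approximate the logFC reported by glmQLFTest because the printed coefficients are rounded.

```r
# Coefficients of LOC729737 (natural-log scale), copied from the fit output
coefs <- c(Brain = -11.37611, Liver = -9.939963, Lung = -9.555298)

# One row per comparison, columns in the order of the design matrix
contrasts <- rbind(
  BrainVsLiver = c(1, -1,  0),
  BrainVsLung  = c(1,  0, -1),
  LiverVsLung  = c(0,  1, -1)
)

# Contrast applied to the coefficients, converted from natural log to log2
log2fc <- as.vector(contrasts %*% coefs) / log(2)
names(log2fc) <- rownames(contrasts)
print(round(log2fc, 3))
```

All three values are negative, so this gene is lowest in brain and highest in lung, in agreement with its raw counts.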

topTags extracts the top DE tags in a data frame for a given pair of groups, ranked by p-value or absolute log-fold change:

resultsBrainLiver = topTags(qlfBrainLiver, n = 10000000, adjust.method = "BH", sort.by = "PValue", p.value = 1)
resultBrainLung= topTags(qlfBrainLung, n = 10000000, adjust.method = "BH", sort.by = "PValue", p.value = 1)
resultsLiverLung = topTags(qlfLiverLung , n = 10000000, adjust.method = "BH", sort.by = "PValue", p.value = 1)
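The "BH" adjustment requested in topTags is the Benjamini-Hochberg procedure, which produces the FDR column used in the following sections. A minimal sketch on four made-up p-values shows that base R's `p.adjust` gives the same result as the step-by-step calculation.

```r
p <- c(0.001, 0.01, 0.02, 0.8)  # made-up p-values for illustration

# Built-in adjustment
fdr_builtin <- p.adjust(p, method = "BH")

# Manual version: p * n / rank, then enforce monotonicity from the largest p down
n <- length(p)
o <- order(p, decreasing = TRUE)
fdr_manual <- numeric(n)
fdr_manual[o] <- pmin(1, cummin(p[o] * n / (n:1)))

print(fdr_builtin)
print(fdr_manual)
```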

I also take a look at the numbers of up-regulated, down-regulated and non-significant genes in each comparison:

summary(decideTests(qlfBrainLiver, p.value=0.01, lfc=1)) 
##        1*Brain -1*Liver
## Down               3906
## NotSig            16307
## Up                 3646
summary(decideTests(qlfBrainLung, p.value=0.01, lfc=1)) 
##        1*Brain -1*Lung
## Down              3169
## NotSig           17954
## Up                2736
summary(decideTests(qlfLiverLung, p.value=0.01, lfc=1)) 
##        1*Liver -1*Lung
## Down              2411
## NotSig           18935
## Up                2513

Up genes in one condition vs both

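Now it is possible to find the genes up-regulated in each tissue with respect to the other two.

To do this I intersect the two tables containing the two comparisons of one tissue against each of the other two. In both comparisons I keep only the genes that satisfy the following criteria: a log fold change greater than 1 in favour of the tissue of interest, logCPM > 0 and FDR < 0.01.

I also delete all the genes that are not useful for my analysis, namely those whose name starts with LOC, MIR, LINC or SNORD: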
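The selection logic can be tried on two small made-up tables before applying it to the real results. In this sketch the tables, gene names and the `select_up` helper are all hypothetical, and base R subsetting is used in place of the dplyr `filter` call of the actual analysis.

```r
# Hypothetical result tables with gene names as row names
brain_vs_liver <- data.frame(
  logFC  = c(3.2, 2.5, 0.4, 4.1),
  logCPM = c(5, 2, 6, -1),
  FDR    = c(1e-5, 1e-3, 0.2, 1e-6),
  row.names = c("GENE1", "LOC123", "GENE2", "GENE3")
)
brain_vs_lung <- data.frame(
  logFC  = c(2.8, 1.9, 3.0, 2.2),
  logCPM = c(5, 2, 6, 3),
  FDR    = c(1e-4, 1e-3, 1e-3, 1e-4),
  row.names = c("GENE1", "LOC123", "GENE2", "GENE4")
)

# Hypothetical helper: names of the genes passing the three criteria
select_up <- function(tab) {
  rownames(tab)[tab$logFC > 1 & tab$logCPM > 0 & tab$FDR < 0.01]
}

# Genes up-regulated in both comparisons
up_both <- intersect(select_up(brain_vs_liver), select_up(brain_vs_lung))

# Drop names starting with one of the unwanted prefixes
prefixes <- c("LOC", "MIR", "LINC", "SNORD")
is_unwanted <- Reduce(`|`, lapply(prefixes, function(p) startsWith(up_both, p)))
up_final <- up_both[!is_unwanted]
print(up_final)
```

Here GENE2 and GENE3 fail one criterion in the first table, GENE4 is missing from it, and LOC123 is removed by the prefix mask, so only GENE1 survives.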

Brain

brain_1 <- rownames(resultsBrainLiver$table %>% filter(logFC > 1 & logCPM > 0 & FDR < 0.01))
brain_2 <- rownames(resultBrainLung$table %>% filter(logFC > 1 & logCPM > 0 & FDR < 0.01))
brain_total <- intersect(brain_1, brain_2)

table(startsWith(brain_total, "RPL"))
## 
## FALSE 
##  1717
maskBrain <- startsWith(brain_total, "LOC") | startsWith(brain_total,"MIR") | startsWith(brain_total, "LINC") | startsWith(brain_total, "SNORD")
brain_total <- brain_total[!maskBrain]
head(brain_total)
## [1] "NSG1"   "BCAN"   "NYAP1"  "DNAJC6" "MYT1L"  "CELF4"

Liver

liver_1 <- rownames(resultsBrainLiver$table %>% filter(logFC < -1 & logCPM > 0 & FDR < 0.01))
liver_2 <- rownames(resultsLiverLung$table %>% filter(logFC > 1 & logCPM > 0 & FDR < 0.01))
liver_total <- intersect(liver_1, liver_2)

table(startsWith(liver_total, "RPL"))
## 
## FALSE 
##  1636
maskLiver <- startsWith(liver_total, "LOC") | startsWith(liver_total,"MIR") | startsWith(liver_total, "LINC") | startsWith(liver_total, "SNORD")
liver_total <- liver_total[!maskLiver]
head(liver_total)
## [1] "PRAP1"  "C3P1"   "PON1"   "CPN2"   "AKR1C4" "CFHR1"

Lung

lung_1 <- rownames(resultBrainLung$table %>% filter(logFC < -1 & logCPM > 0 & FDR < 0.01))
lung_2 <- rownames(resultsLiverLung$table %>% filter(logFC < -1 & logCPM > 0 & FDR < 0.01))
lung_total <- intersect(lung_1,lung_2)

table(startsWith(lung_total, "RPL"))
## 
## FALSE 
##  1006
maskLung <- startsWith(lung_total, "LOC") | startsWith(lung_total,"MIR") | startsWith(lung_total, "LINC") | startsWith(lung_total, "SNORD")
lung_total <- lung_total[!maskLung]
head(lung_total)
## [1] "TCF21"   "IDO1"    "TBX2"    "SLC11A1" "FENDRR"  "ITGA1"

Example of a gene over-represented in one tissue

I select one gene that is differentially expressed in one tissue against the other two and check its transcripts in the UCSC Genome Browser. In this case I choose NSG1.

This is the screenshot of the alternative transcripts:

It is possible to notice some events of alternative splicing:

which(rowData(rse_brain)$gene_name == "NSG1") #38639
## [1] 38639

This gene is more expressed in brain than in liver and lung, where it seems not to be expressed. It is possible to double-check this with an appropriate statistical test:

assays(rse_brain)$TPM <- recount::getTPM(rse_brain)
assays(rse_lung)$TPM <- recount::getTPM(rse_lung)
assays(rse_liver)$TPM <- recount::getTPM(rse_liver)
df_b=data.frame(TPM=assays(rse_brain)$TPM[38639,],group="Brain") 
df_lu=data.frame(TPM=assays(rse_lung)$TPM[38639,],group="Lung") 
df_li=data.frame(TPM=assays(rse_liver)$TPM[38639,],group="Liver") 
data_NSG1=rbind(df_b,df_lu,df_li)

#Statistical test 
res_kruskal <- data_NSG1 %>% kruskal_test(TPM ~ group) 
res_kruskal
## # A tibble: 1 × 6
##   .y.       n statistic    df     p method        
## * <chr> <int>     <dbl> <int> <dbl> <chr>         
## 1 TPM    3837     2067.     2     0 Kruskal-Wallis

The p-value of the Kruskal-Wallis test is reported as 0 because it is smaller than the smallest number that can be represented in double precision. This indicates an extremely significant difference in the distribution of expression among the three tissues; it does not mean that the probability of observing these differences by chance is exactly zero.
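The underflow can be reproduced with base R. The Kruskal-Wallis statistic is compared with a chi-squared distribution with 2 degrees of freedom, so the sketch below recomputes the tail probability on the log scale. The statistic is taken as 2067, the rounded value printed above, so the exponent is only approximate.

```r
stat <- 2067  # rounded statistic from the output above
df <- 2

# On the natural scale the tail probability underflows to exactly 0
p_plain <- pchisq(stat, df = df, lower.tail = FALSE)

# On the log scale the value is still representable
log_p <- pchisq(stat, df = df, lower.tail = FALSE, log.p = TRUE)
log10_p <- log_p / log(10)

print(p_plain)
print(log10_p)                # order of magnitude of the p-value
print(.Machine$double.xmin)   # smallest normalized double, for comparison
```

The p-value is therefore a positive number, just far too small to be stored, which is why it is printed as 0.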

I then run pairwise Wilcoxon tests and represent the result with a boxplot:

pwc2=data_NSG1 %>% wilcox_test(TPM ~ group, p.adjust.method = "BH") 
pwc2 
## # A tibble: 3 × 9
##   .y.   group1 group2    n1    n2 statistic         p     p.adj p.adj.signif
## * <chr> <chr>  <chr>  <int> <int>     <dbl>     <dbl>     <dbl> <chr>       
## 1 TPM   Brain  Liver   2931   251    735031 2.80e-152 4.20e-152 ****        
## 2 TPM   Brain  Lung    2931   655   1911432 0         0         ****        
## 3 TPM   Liver  Lung     251   655     17892 2.33e- 74 2.33e- 74 ****
pwc = pwc2 %>% add_xy_position(x = "group")
ggboxplot(data_NSG1, x = "group", y = "TPM",outlier.shape = NA,width = 0.5,title="NSG1 expression across tissues", fill = "#0099FF") + 
stat_pvalue_manual(pwc,y.position = c(400,400,400)) + 
labs(subtitle = get_test_label(res_kruskal, detailed = TRUE),caption = get_pwc_label(pwc)) 

The same check can be repeated for a second gene, SERPINA6:

# SERPINA6 matches two rows; the second one (19946) is used below
which(rowData(rse_liver)$gene_name == "SERPINA6")
## [1]  3648 19946
which(rowData(rse_brain)$gene_name == "ADH6")
## [1] 39648
which(rowData(rse_lung)$gene_name == "ADH6")
## [1] 39648
df_b=data.frame(TPM=assays(rse_brain)$TPM[19946,],group="Brain") 
df_lu=data.frame(TPM=assays(rse_lung)$TPM[19946,],group="Lung") 
df_li=data.frame(TPM=assays(rse_liver)$TPM[19946,],group="Liver") 
data_SERPINA6=rbind(df_b,df_lu,df_li)

#Statistical test 
res_kruskal <- data_SERPINA6 %>% kruskal_test(TPM ~ group) 
res_kruskal
## # A tibble: 1 × 6
##   .y.       n statistic    df     p method        
## * <chr> <int>     <dbl> <int> <dbl> <chr>         
## 1 TPM    3837     1500.     2     0 Kruskal-Wallis
pwc2=data_SERPINA6 %>% wilcox_test(TPM ~ group, p.adjust.method = "BH") 
pwc2 
## # A tibble: 3 × 9
##   .y.   group1 group2    n1    n2 statistic         p     p.adj p.adj.signif
## * <chr> <chr>  <chr>  <int> <int>     <dbl>     <dbl>     <dbl> <chr>       
## 1 TPM   Brain  Liver   2931   251       131 2.12e-242 6.36e-242 ****        
## 2 TPM   Brain  Lung    2931   655    456308 9.25e-147 1.39e-146 ****        
## 3 TPM   Liver  Lung     251   655    164160 6.91e-121 6.91e-121 ****
pwc = pwc2 %>% add_xy_position(x = "group")
ggboxplot(data_SERPINA6, x = "group", y = "TPM",outlier.shape = NA,width = 0.5,title="SERPINA6 expression across tissues", fill = "#0099FF") + 
stat_pvalue_manual(pwc,y.position = c(400,400,400)) + 
labs(subtitle = get_test_label(res_kruskal, detailed = TRUE),caption = get_pwc_label(pwc))

Ontologies enrichment analysis

I split the DE genes into those “up”-regulated and those “down”-regulated in the experiment, according to the sign of the log-fold change. Then I compare their overlap with all the GO terms and evaluate the enrichment of each GO term with the corresponding (corrected) p-value.
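The overlap test behind this kind of enrichment can be sketched with the hypergeometric distribution. All the numbers below are assumptions chosen for illustration (they are not taken from Enrichr or from my gene lists), and I am only assuming that the p-value reported by Enrichr comes from a test of this family; the sketch shows that the one-sided Fisher exact test and the hypergeometric tail give the same answer.

```r
N <- 20000  # assumed number of genes in the background
K <- 150    # assumed number of genes annotated to one GO term
n <- 1000   # assumed size of the up-regulated gene list
k <- 25     # assumed overlap between the list and the term

# Probability of an overlap of at least k under random sampling
p_hyper <- phyper(k - 1, K, N - K, n, lower.tail = FALSE)

# Same test written as a 2x2 table: rows = in list / not in list,
# columns = in term / not in term
tab <- matrix(c(k, K - k, n - k, N - K - n + k), nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

print(c(hypergeometric = p_hyper, fisher = p_fisher))
```

With an expected overlap of 7.5 genes (1000 × 150 / 20000), observing 25 gives a very small p-value, which would then be corrected for the number of GO terms tested.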

The first step is to load the package and set Enrichr as my target for the enrichment analysis:

library('enrichR')
## Welcome to enrichR
## Checking connection ...
## Enrichr ... Connection is Live!
## FlyEnrichr ... Connection is Live!
## WormEnrichr ... Connection is Live!
## YeastEnrichr ... Connection is Live!
## FishEnrichr ... Connection is Live!
## OxEnrichr ... Connection is Live!
setEnrichrSite("Enrichr")
## Connection changed to https://maayanlab.cloud/Enrichr/
## Connection is Live!
websiteLive <- TRUE

Then I can start considering the up-regulated genes.

Up regulated genes in brain

dbs_ontologies <- c("GO_Biological_Process_2023", "GO_Molecular_Function_2023", "GO_Cellular_Component_2023")
if (websiteLive) {
    enriched_ontologies <- enrichr(brain_total, dbs_ontologies)
}
## Uploading data to Enrichr... Done.
##   Querying GO_Biological_Process_2023... Done.
##   Querying GO_Molecular_Function_2023... Done.
##   Querying GO_Cellular_Component_2023... Done.
## Parsing results... Done.
if (websiteLive) plotEnrich(title = "Enriched terms of GO Biological Process 2023 database", enriched_ontologies[[1]], showTerms = 5, numChar = 40, y = "Count", orderBy = "P.value")

if (websiteLive) plotEnrich(title = "Enriched terms of GO Molecular Function 2023 database", enriched_ontologies[[2]], showTerms = 5, numChar = 40, y = "Count", orderBy = "P.value")

if (websiteLive) plotEnrich(title = "Enriched terms of GO Cellular Component 2023 database", enriched_ontologies[[3]], showTerms = 5, numChar = 40, y = "Count", orderBy = "P.value")

Up regulated genes in liver

if (websiteLive) {
    enriched_ontologies <- enrichr(liver_total, dbs_ontologies)
}
## Uploading data to Enrichr... Done.
##   Querying GO_Biological_Process_2023... Done.
##   Querying GO_Molecular_Function_2023... Done.
##   Querying GO_Cellular_Component_2023... Done.
## Parsing results... Done.
if (websiteLive) plotEnrich(title = "Enriched terms of GO Biological Process 2023 database", enriched_ontologies[[1]], showTerms = 5, numChar = 40, y = "Count", orderBy = "P.value")

if (websiteLive) plotEnrich(title = "Enriched terms of GO Molecular Function 2023 database", enriched_ontologies[[2]], showTerms = 5, numChar = 50, y = "Count", orderBy = "P.value")

if (websiteLive) plotEnrich(title = "Enriched terms of GO Cellular Component 2023 database", enriched_ontologies[[3]], showTerms = 5, numChar = 100, y = "Count", orderBy = "P.value")

Up regulated genes in lung

if (websiteLive) {
    enriched_ontologies <- enrichr(lung_total, dbs_ontologies)
}
## Uploading data to Enrichr... Done.
##   Querying GO_Biological_Process_2023... Done.
##   Querying GO_Molecular_Function_2023... Done.
##   Querying GO_Cellular_Component_2023... Done.
## Parsing results... Done.
if (websiteLive) plotEnrich(title = "Enriched terms of GO Biological Process 2023 database", enriched_ontologies[[1]], showTerms = 5, numChar = 100, y = "Count", orderBy = "P.value")

if (websiteLive) plotEnrich(title = "Enriched terms of GO Molecular Function 2023 database", enriched_ontologies[[2]], showTerms = 5, numChar = 100, y = "Count", orderBy = "P.value")

if (websiteLive) plotEnrich(title = "Enriched terms of GO Cellular Component 2023 database", enriched_ontologies[[3]], showTerms = 5, numChar = 100, y = "Count", orderBy = "P.value")

The results for brain are quite clear, while for liver and lung it is still not possible to identify the tissue of origin just by looking at the GO terms. So I decided to perform an additional analysis.

Pathway enrichment analysis

Brain

available_databases <- listEnrichrDbs()
print(available_databases)
##     geneCoverage genesPerTerm
## 1          13362          275
## 2          27884         1284
## 3           6002           77
## 4          47172         1370
## 5          47107          509
## 6          21493         3713
## 7           1295           18
## 8           3185           73
## 9           2854           34
## 10         15057          300
## 11          4128           48
## 12         34061          641
## 13          7504          155
## 14         16399          247
## 15         12753           57
## 16         23726          127
## 17         32740           85
## 18         13373          258
## 19         19270          388
## 20         13236           82
## 21         14264           58
## 22          3096           31
## 23         22288         4368
## 24          4533           37
## 25         10231          158
## 26          2741            5
## 27          5655          342
## 28         10406          715
## 29         10493          200
## 30         11251          100
## 31          8695          100
## 32          1759           25
## 33          2178           89
## 34           851           15
## 35         10061          106
## 36         11250          166
## 37         15406          300
## 38         17711          300
## 39         17576          300
## 40         15797          176
## 41         12232          343
## 42         13572          301
## 43          6454          301
## 44          3723           47
## 45          7588           35
## 46          7682           78
## 47          7324          172
## 48          8469          122
## 49         13121          305
## 50         26382         1811
## 51         29065         2123
## 52           280            9
## 53         13877          304
## 54         15852          912
## 55          4320          129
## 56          4271          128
## 57         10496          201
## 58          1678           21
## 59           756           12
## 60          3800           48
## 61          2541           39
## 62          1918           39
## 63          5863           51
## 64          6768           47
## 65         25651          807
## 66         19129         1594
## 67         23939          293
## 68         23561          307
## 69         23877          302
## 70         15886            9
## 71         24350          299
## 72          3102           25
## 73         31132          298
## 74         30832          302
## 75         48230         1429
## 76          5613           36
## 77          9559           73
## 78          9448           63
## 79         16725         1443
## 80         19249         1443
## 81         15090          282
## 82         16129          292
## 83         15309          308
## 84         15103          318
## 85         15022          290
## 86         15676          310
## 87         15854          279
## 88         15015          321
## 89          3788          159
## 90          3357          153
## 91         12668          300
## 92         12638          300
## 93          8973           64
## 94          7010           87
## 95          5966           51
## 96         15562          887
## 97         17850          300
## 98         17660          300
## 99          1348           19
## 100          934           13
## 101         2541           39
## 102         2041           42
## 103         5209          300
## 104        49238         1550
## 105         2243           19
## 106        19586          545
## 107        22440          505
## 108         8184           24
## 109        18329          161
## 110        15755           28
## 111        10271           22
## 112        10427           38
## 113        10601           25
## 114        13822           21
## 115         8002          143
## 116        10089           45
## 117        13247           49
## 118        21809         2316
## 119        23601         2395
## 120        20883          299
## 121        19612          299
## 122        25983          299
## 123        19500          137
## 124        14893          128
## 125        17598         1208
## 126         5902          109
## 127        12486          299
## 128         1073          100
## 129        19513          117
## 130        14433           36
## 131         8655           61
## 132        11459           39
## 133        19741          270
## 134        27360          802
## 135        13072           26
## 136        13464           45
## 137        13787          200
## 138        13929          200
## 139        16964          200
## 140        17258          200
## 141        10352           58
## 142        10471           76
## 143        12419          491
## 144        19378           37
## 145         6201           45
## 146         4558           54
## 147         3264           22
## 148         7802           92
## 149         8551           98
## 150        12444           23
## 151         9000           20
## 152         7744          363
## 153         6204          387
## 154        13420           32
## 155        14148          122
## 156         9813           49
## 157         1397           13
## 158         9116           22
## 159        17464           63
## 160          394           73
## 161        11851          586
## 162         8189          421
## 163        18704          100
## 164         5605           39
## 165         5718           31
## 166        14156           40
## 167        16979          295
## 168         4383          146
## 169        54974          483
## 170        12118          448
## 171        12361          124
## 172         9763          139
## 173         8078          102
## 174         7173           43
## 175         5833          100
## 176        14937           33
## 177        11497           80
## 178        11936           34
## 179         9767           33
## 180        14167           80
## 181        17851          102
## 182        16853          360
## 183         6654          136
## 184         1683           10
## 185        20414          112
## 186        26076          250
## 187        26338          250
## 188        25381          250
## 189        25409          250
## 190        11980          250
## 191        31158          805
## 192        30006          815
## 193        13370          103
## 194        13697          343
## 195         2183           18
## 196        12765           13
## 197         1509          100
## 198        18365         1214
## 199        13525          175
## 200         9525          245
## 201         9440          245
## 202         3857           80
## 203        10489           61
## 204         1198           23
## 205         1882           47
## 206         1552           16
## 207         6713           68
## 208          936           15
## 209         8220          146
## 210         9021          793
## 211         8076           96
## 212        14698           33
## 213        10972           85
## 214        12126           38
## 215        13662           12
## 216        18290           34
## 217        12081           50
## 218        12853          485
##                                            libraryName
## 1                                  Genome_Browser_PWMs
## 2                             TRANSFAC_and_JASPAR_PWMs
## 3                            Transcription_Factor_PPIs
## 4                                            ChEA_2013
## 5                     Drug_Perturbations_from_GEO_2014
## 6                              ENCODE_TF_ChIP-seq_2014
## 7                                        BioCarta_2013
## 8                                        Reactome_2013
## 9                                    WikiPathways_2013
## 10                 Disease_Signatures_from_GEO_up_2014
## 11                                           KEGG_2013
## 12                          TF-LOF_Expression_from_GEO
## 13                                 TargetScan_microRNA
## 14                                    PPI_Hub_Proteins
## 15                          GO_Molecular_Function_2015
## 16                                           GeneSigDB
## 17                                 Chromosome_Location
## 18                                    Human_Gene_Atlas
## 19                                    Mouse_Gene_Atlas
## 20                          GO_Cellular_Component_2015
## 21                          GO_Biological_Process_2015
## 22                            Human_Phenotype_Ontology
## 23                     Epigenomics_Roadmap_HM_ChIP-seq
## 24                                            KEA_2013
## 25                   NURSA_Human_Endogenous_Complexome
## 26                                               CORUM
## 27                             SILAC_Phosphoproteomics
## 28                     MGI_Mammalian_Phenotype_Level_3
## 29                     MGI_Mammalian_Phenotype_Level_4
## 30                                         Old_CMAP_up
## 31                                       Old_CMAP_down
## 32                                        OMIM_Disease
## 33                                       OMIM_Expanded
## 34                                           VirusMINT
## 35                                MSigDB_Computational
## 36                         MSigDB_Oncogenic_Signatures
## 37               Disease_Signatures_from_GEO_down_2014
## 38                     Virus_Perturbations_from_GEO_up
## 39                   Virus_Perturbations_from_GEO_down
## 40                       Cancer_Cell_Line_Encyclopedia
## 41                            NCI-60_Cancer_Cell_Lines
## 42         Tissue_Protein_Expression_from_ProteomicsDB
## 43   Tissue_Protein_Expression_from_Human_Proteome_Map
## 44                                    HMDB_Metabolites
## 45                               Pfam_InterPro_Domains
## 46                          GO_Biological_Process_2013
## 47                          GO_Cellular_Component_2013
## 48                          GO_Molecular_Function_2013
## 49                                Allen_Brain_Atlas_up
## 50                             ENCODE_TF_ChIP-seq_2015
## 51                   ENCODE_Histone_Modifications_2015
## 52                   Phosphatase_Substrates_from_DEPOD
## 53                              Allen_Brain_Atlas_down
## 54                   ENCODE_Histone_Modifications_2013
## 55                           Achilles_fitness_increase
## 56                           Achilles_fitness_decrease
## 57                        MGI_Mammalian_Phenotype_2013
## 58                                       BioCarta_2015
## 59                                       HumanCyc_2015
## 60                                           KEGG_2015
## 61                                     NCI-Nature_2015
## 62                                        Panther_2015
## 63                                   WikiPathways_2015
## 64                                       Reactome_2015
## 65                                              ESCAPE
## 66                                          HomoloGene
## 67                 Disease_Perturbations_from_GEO_down
## 68                   Disease_Perturbations_from_GEO_up
## 69                    Drug_Perturbations_from_GEO_down
## 70                    Genes_Associated_with_NIH_Grants
## 71                      Drug_Perturbations_from_GEO_up
## 72                                            KEA_2015
## 73                      Gene_Perturbations_from_GEO_up
## 74                    Gene_Perturbations_from_GEO_down
## 75                                           ChEA_2015
## 76                                               dbGaP
## 77                            LINCS_L1000_Chem_Pert_up
## 78                          LINCS_L1000_Chem_Pert_down
## 79                         GTEx_Tissue_Expression_Down
## 80                           GTEx_Tissue_Expression_Up
## 81                  Ligand_Perturbations_from_GEO_down
## 82                   Aging_Perturbations_from_GEO_down
## 83                     Aging_Perturbations_from_GEO_up
## 84                    Ligand_Perturbations_from_GEO_up
## 85                    MCF7_Perturbations_from_GEO_down
## 86                      MCF7_Perturbations_from_GEO_up
## 87                 Microbe_Perturbations_from_GEO_down
## 88                   Microbe_Perturbations_from_GEO_up
## 89               LINCS_L1000_Ligand_Perturbations_down
## 90                 LINCS_L1000_Ligand_Perturbations_up
## 91            L1000_Kinase_and_GPCR_Perturbations_down
## 92              L1000_Kinase_and_GPCR_Perturbations_up
## 93                                       Reactome_2016
## 94                                           KEGG_2016
## 95                                   WikiPathways_2016
## 96           ENCODE_and_ChEA_Consensus_TFs_from_ChIP-X
## 97                  Kinase_Perturbations_from_GEO_down
## 98                    Kinase_Perturbations_from_GEO_up
## 99                                       BioCarta_2016
## 100                                      HumanCyc_2016
## 101                                    NCI-Nature_2016
## 102                                       Panther_2016
## 103                                         DrugMatrix
## 104                                          ChEA_2016
## 105                                              huMAP
## 106                                     Jensen_TISSUES
## 107  RNA-Seq_Disease_Gene_and_Drug_Signatures_from_GEO
## 108                       MGI_Mammalian_Phenotype_2017
## 109                                Jensen_COMPARTMENTS
## 110                                    Jensen_DISEASES
## 111                                       BioPlex_2017
## 112                         GO_Cellular_Component_2017
## 113                         GO_Molecular_Function_2017
## 114                         GO_Biological_Process_2017
## 115                        GO_Cellular_Component_2017b
## 116                        GO_Molecular_Function_2017b
## 117                        GO_Biological_Process_2017b
## 118                                     ARCHS4_Tissues
## 119                                  ARCHS4_Cell-lines
## 120                                   ARCHS4_IDG_Coexp
## 121                               ARCHS4_Kinases_Coexp
## 122                                   ARCHS4_TFs_Coexp
## 123                            SysMyo_Muscle_Gene_Sets
## 124                                    miRTarBase_2017
## 125                           TargetScan_microRNA_2017
## 126               Enrichr_Libraries_Most_Popular_Genes
## 127            Enrichr_Submissions_TF-Gene_Coocurrence
## 128         Data_Acquisition_Method_Most_Popular_Genes
## 129                                             DSigDB
## 130                         GO_Biological_Process_2018
## 131                         GO_Cellular_Component_2018
## 132                         GO_Molecular_Function_2018
## 133            TF_Perturbations_Followed_by_Expression
## 134                           Chromosome_Location_hg19
## 135                  NIH_Funded_PIs_2017_Human_GeneRIF
## 136                  NIH_Funded_PIs_2017_Human_AutoRIF
## 137           Rare_Diseases_AutoRIF_ARCHS4_Predictions
## 138           Rare_Diseases_GeneRIF_ARCHS4_Predictions
## 139     NIH_Funded_PIs_2017_AutoRIF_ARCHS4_Predictions
## 140     NIH_Funded_PIs_2017_GeneRIF_ARCHS4_Predictions
## 141                   Rare_Diseases_GeneRIF_Gene_Lists
## 142                   Rare_Diseases_AutoRIF_Gene_Lists
## 143                                    SubCell_BarCode
## 144                                  GWAS_Catalog_2019
## 145                            WikiPathways_2019_Human
## 146                            WikiPathways_2019_Mouse
## 147                  TRRUST_Transcription_Factors_2019
## 148                                    KEGG_2019_Human
## 149                                    KEGG_2019_Mouse
## 150                              InterPro_Domains_2019
## 151                                  Pfam_Domains_2019
## 152      DepMap_WG_CRISPR_Screens_Broad_CellLines_2019
## 153     DepMap_WG_CRISPR_Screens_Sanger_CellLines_2019
## 154               MGI_Mammalian_Phenotype_Level_4_2019
## 155                                 UK_Biobank_GWAS_v1
## 156                                     BioPlanet_2019
## 157                                       ClinVar_2019
## 158                                        PheWeb_2019
## 159                                           DisGeNET
## 160                               HMS_LINCS_KinomeScan
## 161                               CCLE_Proteomics_2020
## 162                                  ProteomicsDB_2020
## 163                        lncHUB_lncRNA_Co-Expression
## 164                      Virus-Host_PPI_P-HIPSTer_2020
## 165                        Elsevier_Pathway_Collection
## 166                     Table_Mining_of_CRISPR_Studies
## 167                         COVID-19_Related_Gene_Sets
## 168                               MSigDB_Hallmark_2020
## 169               Enrichr_Users_Contributed_Lists_2020
## 170                                      TG_GATES_2020
## 171                   Allen_Brain_Atlas_10x_scRNA_2021
## 172               Descartes_Cell_Types_and_Tissue_2021
## 173                                    KEGG_2021_Human
## 174                             WikiPathway_2021_Human
## 175 HuBMAP_ASCT_plus_B_augmented_w_RNAseq_Coexpression
## 176                         GO_Biological_Process_2021
## 177                         GO_Cellular_Component_2021
## 178                         GO_Molecular_Function_2021
## 179               MGI_Mammalian_Phenotype_Level_4_2021
## 180                          CellMarker_Augmented_2021
## 181                            Orphanet_Augmented_2021
## 182                    COVID-19_Related_Gene_Sets_2021
## 183                           PanglaoDB_Augmented_2021
## 184                            Azimuth_Cell_Types_2021
## 185                          PhenGenI_Association_2021
## 186         RNAseq_Automatic_GEO_Signatures_Human_Down
## 187           RNAseq_Automatic_GEO_Signatures_Human_Up
## 188         RNAseq_Automatic_GEO_Signatures_Mouse_Down
## 189           RNAseq_Automatic_GEO_Signatures_Mouse_Up
## 190                         GTEx_Aging_Signatures_2021
## 191                                 HDSigDB_Human_2021
## 192                                 HDSigDB_Mouse_2021
## 193                    HuBMAP_ASCTplusB_augmented_2022
## 194                             FANTOM6_lncRNA_KD_DEGs
## 195                           MAGMA_Drugs_and_Diseases
## 196                                     PFOCR_Pathways
## 197                                     Tabula_Sapiens
## 198                                          ChEA_2022
## 199                    Diabetes_Perturbations_GEO_2022
## 200               LINCS_L1000_Chem_Pert_Consensus_Sigs
## 201               LINCS_L1000_CRISPR_KO_Consensus_Sigs
## 202                                       Tabula_Muris
## 203                                      Reactome_2022
## 204                                         SynGO_2022
## 205                  GlyGen_Glycosylated_Proteins_2022
## 206                              IDG_Drug_Targets_2022
## 207                        KOMP2_Mouse_Phenotypes_2022
## 208            Metabolomics_Workbench_Metabolites_2022
## 209                         Proteomics_Drug_Atlas_2023
## 210                            The_Kinase_Library_2023
## 211                               GTEx_Tissues_V8_2023
## 212                         GO_Biological_Process_2023
## 213                         GO_Cellular_Component_2023
## 214                         GO_Molecular_Function_2023
## 215                                PFOCR_Pathways_2023
## 216                                  GWAS_Catalog_2023
## 217                                      GeDiPNet_2023
## 218                                        MAGNET_2023
##                                                                                link
## 1                          http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/
## 2                                          http://jaspar.genereg.net/html/DOWNLOAD/
## 3                                                                                  
## 4                                    http://amp.pharm.mssm.edu/lib/cheadownload.jsp
## 5                                                  http://www.ncbi.nlm.nih.gov/geo/
## 6                                      http://genome.ucsc.edu/ENCODE/downloads.html
## 7                               https://cgap.nci.nih.gov/Pathways/BioCarta_Pathways
## 8                                       http://www.reactome.org/download/index.html
## 9                           http://www.wikipathways.org/index.php/Download_Pathways
## 10                                                 http://www.ncbi.nlm.nih.gov/geo/
## 11                                                http://www.kegg.jp/kegg/download/
## 12                                                 http://www.ncbi.nlm.nih.gov/geo/
## 13        http://www.targetscan.org/cgi-bin/targetscan/data_download.cgi?db=vert_61
## 14                                                    http://amp.pharm.mssm.edu/X2K
## 15                       http://www.geneontology.org/GO.downloads.annotations.shtml
## 16                                        https://pubmed.ncbi.nlm.nih.gov/22110038/
## 17                         http://software.broadinstitute.org/gsea/msigdb/index.jsp
## 18                                                     http://biogps.org/downloads/
## 19                                                     http://biogps.org/downloads/
## 20                       http://www.geneontology.org/GO.downloads.annotations.shtml
## 21                       http://www.geneontology.org/GO.downloads.annotations.shtml
## 22                                         http://www.human-phenotype-ontology.org/
## 23                                               http://www.roadmapepigenomics.org/
## 24                                 http://amp.pharm.mssm.edu/lib/keacommandline.jsp
## 25                                            https://www.nursa.org/nursa/index.jsf
## 26                              http://mips.helmholtz-muenchen.de/genre/proj/corum/
## 27                                 http://amp.pharm.mssm.edu/lib/keacommandline.jsp
## 28                                                  http://www.informatics.jax.org/
## 29                                                  http://www.informatics.jax.org/
## 30                                              http://www.broadinstitute.org/cmap/
## 31                                              http://www.broadinstitute.org/cmap/
## 32                                                    http://www.omim.org/downloads
## 33                                                    http://www.omim.org/downloads
## 34                                        http://mint.bio.uniroma2.it/download.html
## 35                        http://www.broadinstitute.org/gsea/msigdb/collections.jsp
## 36                        http://www.broadinstitute.org/gsea/msigdb/collections.jsp
## 37                                                 http://www.ncbi.nlm.nih.gov/geo/
## 38                                                 http://www.ncbi.nlm.nih.gov/geo/
## 39                                                 http://www.ncbi.nlm.nih.gov/geo/
## 40                                   https://portals.broadinstitute.org/ccle/home\n
## 41                                                     http://biogps.org/downloads/
## 42                                                    https://www.proteomicsdb.org/
## 43                                        http://www.humanproteomemap.org/index.php
## 44                                                     http://www.hmdb.ca/downloads
## 45                                      ftp://ftp.ebi.ac.uk/pub/databases/interpro/
## 46                       http://www.geneontology.org/GO.downloads.annotations.shtml
## 47                       http://www.geneontology.org/GO.downloads.annotations.shtml
## 48                       http://www.geneontology.org/GO.downloads.annotations.shtml
## 49                                                        http://www.brain-map.org/
## 50                                     http://genome.ucsc.edu/ENCODE/downloads.html
## 51                                     http://genome.ucsc.edu/ENCODE/downloads.html
## 52                                                  http://www.koehn.embl.de/depod/
## 53                                                        http://www.brain-map.org/
## 54                                     http://genome.ucsc.edu/ENCODE/downloads.html
## 55                                           http://www.broadinstitute.org/achilles
## 56                                           http://www.broadinstitute.org/achilles
## 57                                                  http://www.informatics.jax.org/
## 58                              https://cgap.nci.nih.gov/Pathways/BioCarta_Pathways
## 59                                                             http://humancyc.org/
## 60                                                http://www.kegg.jp/kegg/download/
## 61                                                          http://pid.nci.nih.gov/
## 62                                                        http://www.pantherdb.org/
## 63                          http://www.wikipathways.org/index.php/Download_Pathways
## 64                                      http://www.reactome.org/download/index.html
## 65                                                 http://www.maayanlab.net/ESCAPE/
## 66                                           http://www.ncbi.nlm.nih.gov/homologene
## 67                                                 http://www.ncbi.nlm.nih.gov/geo/
## 68                                                 http://www.ncbi.nlm.nih.gov/geo/
## 69                                                 http://www.ncbi.nlm.nih.gov/geo/
## 70                                          https://grants.nih.gov/grants/oer.htm\n
## 71                                                 http://www.ncbi.nlm.nih.gov/geo/
## 72                                                http://amp.pharm.mssm.edu/Enrichr
## 73                                                 http://www.ncbi.nlm.nih.gov/geo/
## 74                                                 http://www.ncbi.nlm.nih.gov/geo/
## 75                                                http://amp.pharm.mssm.edu/Enrichr
## 76                                                  http://www.ncbi.nlm.nih.gov/gap
## 77                                                                 https://clue.io/
## 78                                                                 https://clue.io/
## 79                                                       http://www.gtexportal.org/
## 80                                                       http://www.gtexportal.org/
## 81                                                 http://www.ncbi.nlm.nih.gov/geo/
## 82                                                 http://www.ncbi.nlm.nih.gov/geo/
## 83                                                 http://www.ncbi.nlm.nih.gov/geo/
## 84                                                 http://www.ncbi.nlm.nih.gov/geo/
## 85                                                 http://www.ncbi.nlm.nih.gov/geo/
## 86                                                 http://www.ncbi.nlm.nih.gov/geo/
## 87                                                 http://www.ncbi.nlm.nih.gov/geo/
## 88                                                 http://www.ncbi.nlm.nih.gov/geo/
## 89                                                                 https://clue.io/
## 90                                                                 https://clue.io/
## 91                                                                 https://clue.io/
## 92                                                                 https://clue.io/
## 93                                      http://www.reactome.org/download/index.html
## 94                                                http://www.kegg.jp/kegg/download/
## 95                          http://www.wikipathways.org/index.php/Download_Pathways
## 96                                                                                 
## 97                                                 http://www.ncbi.nlm.nih.gov/geo/
## 98                                                 http://www.ncbi.nlm.nih.gov/geo/
## 99                               http://cgap.nci.nih.gov/Pathways/BioCarta_Pathways
## 100                                                            http://humancyc.org/
## 101                                                         http://pid.nci.nih.gov/
## 102                                               http://www.pantherdb.org/pathway/
## 103                                           https://ntp.niehs.nih.gov/drugmatrix/
## 104                                               http://amp.pharm.mssm.edu/Enrichr
## 105                                                    http://proteincomplexes.org/
## 106                                                   http://tissues.jensenlab.org/
## 107                                                http://www.ncbi.nlm.nih.gov/geo/
## 108                                                 http://www.informatics.jax.org/
## 109                                              http://compartments.jensenlab.org/
## 110                                                  http://diseases.jensenlab.org/
## 111                                                 http://bioplex.hms.harvard.edu/
## 112                                                    http://www.geneontology.org/
## 113                                                    http://www.geneontology.org/
## 114                                                    http://www.geneontology.org/
## 115                                                    http://www.geneontology.org/
## 116                                                    http://www.geneontology.org/
## 117                                                    http://www.geneontology.org/
## 118                                                http://amp.pharm.mssm.edu/archs4
## 119                                                http://amp.pharm.mssm.edu/archs4
## 120                                                http://amp.pharm.mssm.edu/archs4
## 121                                                http://amp.pharm.mssm.edu/archs4
## 122                                                http://amp.pharm.mssm.edu/archs4
## 123                                                     http://sys-myo.rhcloud.com/
## 124                                              http://mirtarbase.mbc.nctu.edu.tw/
## 125                                                      http://www.targetscan.org/
## 126                                               http://amp.pharm.mssm.edu/Enrichr
## 127                                               http://amp.pharm.mssm.edu/Enrichr
## 128                                               http://amp.pharm.mssm.edu/Enrichr
## 129                                   http://tanlab.ucdenver.edu/DSigDB/DSigDBv1.0/
## 130                                                    http://www.geneontology.org/
## 131                                                    http://www.geneontology.org/
## 132                                                    http://www.geneontology.org/
## 133                                                http://www.ncbi.nlm.nih.gov/geo/
## 134                                   http://hgdownload.cse.ucsc.edu/downloads.html
## 135                                            https://www.ncbi.nlm.nih.gov/pubmed/
## 136                                            https://www.ncbi.nlm.nih.gov/pubmed/
## 137                                            https://amp.pharm.mssm.edu/geneshot/
## 138                                 https://www.ncbi.nlm.nih.gov/gene/about-generif
## 139                                            https://www.ncbi.nlm.nih.gov/pubmed/
## 140                                            https://www.ncbi.nlm.nih.gov/pubmed/
## 141                                 https://www.ncbi.nlm.nih.gov/gene/about-generif
## 142                                            https://amp.pharm.mssm.edu/geneshot/
## 143                                                  http://www.subcellbarcode.org/
## 144                                                      https://www.ebi.ac.uk/gwas
## 145                                                   https://www.wikipathways.org/
## 146                                                   https://www.wikipathways.org/
## 147                                                https://www.grnpedia.org/trrust/
## 148                                                            https://www.kegg.jp/
## 149                                                            https://www.kegg.jp/
## 150                                                 https://www.ebi.ac.uk/interpro/
## 151                                                          https://pfam.xfam.org/
## 152                                                             https://depmap.org/
## 153                                                             https://depmap.org/
## 154                                                 http://www.informatics.jax.org/
## 155                                           https://www.ukbiobank.ac.uk/tag/gwas/
## 156                                               https://tripod.nih.gov/bioplanet/
## 157                                           https://www.ncbi.nlm.nih.gov/clinvar/
## 158                                                    http://pheweb.sph.umich.edu/
## 159                                                        https://www.disgenet.org
## 160                                        http://lincs.hms.harvard.edu/kinomescan/
## 161                                         https://portals.broadinstitute.org/ccle
## 162                                                   https://www.proteomicsdb.org/
## 163                                              https://amp.pharm.mssm.edu/lnchub/
## 164                                                            http://phipster.org/
## 165                                       http://www.transgene.ru/disease-pathways/
## 166                                                                                
## 167                                              https://amp.pharm.mssm.edu/covid19
## 168                        https://www.gsea-msigdb.org/gsea/msigdb/collections.jsp 
## 169                                                 https://maayanlab.cloud/Enrichr
## 170                                           https://toxico.nibiohn.go.jp/english/
## 171                                                   https://portal.brain-map.org/
## 172 https://descartes.brotmanbaty.org/bbi/human-gene-expression-during-development/
## 173                                                            https://www.kegg.jp/
## 174                                                   https://www.wikipathways.org/
## 175                           https://hubmapconsortium.github.io/ccf-asct-reporter/
## 176                                                    http://www.geneontology.org/
## 177                                                    http://www.geneontology.org/
## 178                                                    http://www.geneontology.org/
## 179                                                 http://www.informatics.jax.org/
## 180                                           http://biocc.hrbmu.edu.cn/CellMarker/
## 181                                                       http://www.orphadata.org/
## 182                                                https://maayanlab.cloud/covid19/
## 183                                                           https://panglaodb.se/
## 184                                           https://azimuth.hubmapconsortium.org/
## 185                                        https://www.ncbi.nlm.nih.gov/gap/phegeni
## 186                                                 https://maayanlab.cloud/archs4/
## 187                                                 https://maayanlab.cloud/archs4/
## 188                                                 https://maayanlab.cloud/archs4/
## 189                                                 https://maayanlab.cloud/archs4/
## 190                                                         https://gtexportal.org/
## 191                                                         https://www.hdinhd.org/
## 192                                                         https://www.hdinhd.org/
## 193                           https://hubmapconsortium.github.io/ccf-asct-reporter/
## 194                                                  https://fantom.gsc.riken.jp/6/
## 195                      https://github.com/nybell/drugsets/tree/main/DATA/GENESETS
## 196                                                 https://pfocr.wikipathways.org/
## 197                                  https://tabula-sapiens-portal.ds.czbiohub.org/
## 198                                                  https://maayanlab.cloud/chea3/
## 199               https://appyters.maayanlab.cloud/#/Gene_Expression_T2D_Signatures
## 200                                 https://maayanlab.cloud/sigcom-lincs/#/Download
## 201                                 https://maayanlab.cloud/sigcom-lincs/#/Download
## 202                                           https://tabula-muris.ds.czbiohub.org/
## 203                                              https://reactome.org/download-data
## 204                                                    https://www.syngoportal.org/
## 205                                                         https://www.glygen.org/
## 206                                                        https://drugcentral.org/
## 207                                                 https://www.mousephenotype.org/
## 208                                          https://www.metabolomicsworkbench.org/
## 209                              https://www.nature.com/articles/s41587-022-01539-0
## 210                                     https://kinase-library.phosphosite.org/site
## 211                                                    https://gtexportal.org/home/
## 212                                                    http://www.geneontology.org/
## 213                                                    http://www.geneontology.org/
## 214                                                    http://www.geneontology.org/
## 215                                                 https://pfocr.wikipathways.org/
## 216                                                      https://www.ebi.ac.uk/gwas
## 217                                                http://gedipnet.bicnirrh.res.in/
## 218                                         https://magnet-winterlab.herokuapp.com/
##     numTerms                                  appyter categoryId
## 1        615 ea115789fcbf12797fd692cec6df0ab4dbc79c6a          1
## 2        326 7d42eb43a64a4e3b20d721fc7148f685b53b6b30          1
## 3        290 849f222220618e2599d925b6b51868cf1dab3763          1
## 4        353 7ebe772afb55b63b41b79dd8d06ea0fdd9fa2630          7
## 5        701 ad270a6876534b7cb063e004289dcd4d3164f342          7
## 6        498 497787ebc418d308045efb63b8586f10c526af51          7
## 7        249 4a293326037a5229aedb1ad7b2867283573d8bcd          7
## 8         78 b343994a1b68483b0122b08650201c9b313d5c66          7
## 9        199 5c307674c8b97e098f8399c92f451c0ff21cbf68          7
## 10       142 248c4ed8ea28352795190214713c86a39fd7afab          7
## 11       200 eb26f55d3904cb0ea471998b6a932a9bf65d8e50          7
## 12       269                                                   1
## 13       222 f4029bf6a62c91ab29401348e51df23b8c44c90f          7
## 14       385 69c0cfe07d86f230a7ef01b365abcc7f6e52f138          2
## 15      1136 f531ac2b6acdf7587a54b79b465a5f4aab8f00f9          7
## 16      2139 6d655e0aa3408a7accb3311fbda9b108681a8486          4
## 17       386 8dab0f96078977223646ff63eb6187e0813f1433          7
## 18        84 0741451470203d7c40a06274442f25f74b345c9c          5
## 19        96 31191bfadded5f96983f93b2a113cf2110ff5ddb          5
## 20       641 e1d004d5797cbd2363ef54b1c3b361adb68795c6          7
## 21      5192 bf120b6e11242b1a64c80910d8e89f87e618e235          7
## 22      1779 17a138b0b70aa0e143fe63c14f82afb70bc3ed0a          3
## 23       383 e1bc8a398e9b21f9675fb11bef18087eda21b1bf          1
## 24       474 462045609440fa1e628a75716b81a1baa5bd9145          7
## 25      1796 7d3566b12ebc23dd23d9ca9bb97650f826377b16          2
## 26      1658 d047f6ead7831b00566d5da7a3b027ed9196e104          2
## 27        84 54dcd9438b33301deb219866e162b0f9da7e63a0          2
## 28        71 c3bfc90796cfca8f60cba830642a728e23a53565          7
## 29       476 0b09a9a1aa0af4fc7ea22d34a9ae644d45864bd6          7
## 30      6100 9041f90cccbc18479138330228b336265e09021c          7
## 31      6100 ebc0d905b3b3142f936d400c5f2a4ff926c81c37          7
## 32        90 cb2b92578a91e023d0498a334923ee84add34eca          4
## 33       187 27eca242904d8e12a38cf8881395bc50d57a03e1          4
## 34        85 5abad1fc36216222b0420cadcd9be805a0dda63e          4
## 35       858 e4cdcc7e259788fdf9b25586cce3403255637064          4
## 36       189 c76f5319c33c4833c71db86a30d7e33cd63ff8cf          4
## 37       142 aabdf7017ae55ae75a004270924bcd336653b986          7
## 38       323 45268b7fc680d05dd9a29743c2f2b2840a7620bf          4
## 39       323 5f531580ccd168ee4acc18b02c6bdf8200e19d08          4
## 40       967 eb38dbc3fb20adafa9d6f9f0b0e36f378e75284f          5
## 41        93 75c81676d8d6d99d262c9660edc024b78cfb07c9          5
## 42       207                                                   7
## 43        30 49351dc989f9e6ca97c55f8aca7778aa3bfb84b9          5
## 44      3906 1905132115d22e4119bce543bdacaab074edb363          6
## 45       311 e2b4912cfb799b70d87977808c54501544e4cdc9          6
## 46       941 5216d1ade194ffa5a6c00f105e2b1899f64f45fe          7
## 47       205 fd1332a42395e0bc1dba82868b39be7983a48cc5          7
## 48       402 7e3e99e5aae02437f80b0697b197113ce3209ab0          7
## 49      2192 3804715a63a308570e47aa1a7877f01147ca6202          5
## 50       816 56b6adb4dc8a2f540357ef992d6cd93dfa2907e5          1
## 51       412 55b56cd8cf2ff04b26a09b9f92904008b82f3a6f          1
## 52        59 d40701e21092b999f4720d1d2b644dd0257b6259          2
## 53      2192 ea67371adec290599ddf484ced2658cfae259304          5
## 54       109 c209ae527bc8e98e4ccd27a668d36cd2c80b35b4          7
## 55       216 98366496a75f163164106e72439fb2bf2f77de4e          4
## 56       216 83a710c1ff67fd6b8af0d80fa6148c40dbd9bc64          4
## 57       476 a4c6e217a81a4a58ff5a1c9fc102b70beab298e9          7
## 58       239 70e4eb538daa7688691acfe5d9c3c19022be832b          7
## 59       125 711f0c02b23f5e02a01207174943cfeee9d3ea9c          7
## 60       179 e80d25c56de53c704791ddfdc6ab5eec28ae7243          7
## 61       209 47edfc012bcbb368a10b717d8dca103f7814b5a4          7
## 62       104 ab824aeeff0712bab61f372e43aebb870d1677a9          7
## 63       404 1f7eea2f339f37856522c1f1c70ec74c7b25325f          7
## 64      1389 36e541bee015eddb8d53827579549e30fe7a3286          7
## 65       315 a7acc741440264717ff77751a7e5fed723307835          5
## 66        12 663b665b75a804ef98add689f838b68e612f0d2a          6
## 67       839 0f412e0802d76efa0374504c2c9f5e0624ff7f09          8
## 68       839 9ddc3902fb01fb9eaf1a2a7c2ff3acacbb48d37e          8
## 69       906 068623a05ecef3e4a5e0b4f8db64bb8faa3c897f          8
## 70     32876 76fc5ec6735130e287e62bae6770a3c5ee068645          6
## 71       906 c9c2155b5ac81ac496854fa61ba566dcae06cc80          8
## 72       428 18a081774e6e0aaf60b1a4be7fd20afcf9e08399          2
## 73      2460 53dedc29ce3100930d68e506f941ef59de05dc6b          8
## 74      2460 499882af09c62dd6da545c15cb51c1dc5e234f78          8
## 75       395 712eb7b6edab04658df153605ec6079fa89fb5c7          7
## 76       345 010f1267055b1a1cb036e560ea525911c007a666          4
## 77     33132 5e678b3debe8d8ea95187d0cd35c914017af5eb3          7
## 78     33132 fedbf5e221f45ee60ebd944f92569b5eda7f2330          7
## 79      2918 74b818bd299a9c42c1750ffe43616aa9f7929f02          5
## 80      2918 103738763d89cae894bec9f145ac28167a90e611          5
## 81       261 1eb3c0426140340527155fd0ef67029db2a72191          8
## 82       286 cd95fe1b505ba6f28cd722cfba50fdea979d3b4c          8
## 83       286 74c4f0a0447777005b2a5c00c9882a56dfc62d7c          8
## 84       261 31baa39da2931ddd5f7aedf2d0bbba77d2ba7b46          8
## 85       401 555f68aef0a29a67b614a0d7e20b6303df9069c6          8
## 86       401 1bc2ba607f1ff0dda44e2a15f32a2c04767da18c          8
## 87       312 9e613dba78ef7e60676b13493a9dc49ccd3c8b3f          8
## 88       312 d0c3e2a68e8c611c669098df2c87b530cec3e132          8
## 89        96 957846cb05ef31fc8514120516b73cc65af7980e          7
## 90        96 3bd494146c98d8189898a947f5ef5710f1b7c4b2          7
## 91      3644 1ccc5bce553e0c2279f8e3f4ddcfbabcf566623b          7
## 92      3644 b54a0d4ba525eac4055c7314ca9d9312adcb220c          7
## 93      1530 1f54638e8f45075fb79489f0e0ef906594cb0678          7
## 94       293 43f56da7540195ba3c94eb6e34c522a699b36da9          7
## 95       437 340be98b444cad50bb974df69018fd598e23e5e1          7
## 96       104 5426f7747965c23ef32cff46fabf906e2cd76bfa          1
## 97       285 bb9682d78b8fc43be842455e076166fcd02cefc3          2
## 98       285 78618915009cac3a0663d6f99d359e39a31b6660          2
## 99       237 13d9ab18921d5314a5b2b366f6142b78ab0ff6aa          2
## 100      152 d6a502ef9b4c789ed5e73ca5a8de372796e5c72a          2
## 101      209 3c1e1f7d1a651d9aaa198e73704030716fc09431          2
## 102      112 ca5f6abf7f75d9baae03396e84d07300bf1fd051          2
## 103     7876 255c3db820d612f34310f22a6985dad50e9fe1fe          4
## 104      645 af271913344aa08e6a755af1d433ef15768d749a          7
## 105      995 249247d2f686d3eb4b9e4eb976c51159fac80a89          2
## 106     1842 e8879ab9534794721614d78fe2883e9e564d7759          3
## 107     1302 f0752e4d7f5198f86446678966b260c530d19d78          8
## 108     5231 0705e59bff98deda6e9cbe00cfcdd871c85e7d04          7
## 109     2283 56ec68c32d4e83edc2ee83bea0e9f6a3829b2279          3
## 110     1811 3045dff8181367c1421627bb8e4c5a32c6d67f98          3
## 111     3915 b8620b1a9d0d271d1a2747d8cfc63589dba39991          2
## 112      636 8fed21d22dfcc3015c05b31d942fdfc851cc8e04          7
## 113      972 b4018906e0a8b4e81a1b1afc51e0a2e7655403eb          7
## 114     3166 d9da4dba4a3eb84d4a28a3835c06dfbbe5811f92          7
## 115      816 ecf39c41fa5bc7deb625a2b5761a708676e9db7c          7
## 116     3271 8d8340361dd36a458f1f0a401f1a3141de1f3200          7
## 117    10125 6404c38bffc2b3732de4e3fbe417b5043009fe34          7
## 118      108 4126374338235650ab158ba2c61cd2e2383b70df          5
## 119      125 5496ef9c9ae9429184d0b9485c23ba468ee522a8          5
## 120      352 ce60be284fdd5a9fc6240a355421a9e12b1ee84a          4
## 121      498 6721c5ed97b7772e4a19fdc3f797110df0164b75          2
## 122     1724 8a468c3ae29fa68724f744cbef018f4f3b61c5ab          1
## 123     1135                                                   8
## 124     3240 6b7c7fe2a97b19aecbfba12d8644af6875ad99c4          1
## 125      683 79d13fb03d2fa6403f9be45c90eeda0f6822e269          1
## 126      121 e9b7d8ee237d0a690bd79d970a23a9fa849901ed          6
## 127     1722 be2ca8ef5a8c8e17d7e7bd290e7cbfe0951396c0          1
## 128       12 17ce5192b9eba7d109b6d228772ea8ab222e01ef          6
## 129     4026 287476538ab98337dbe727b3985a436feb6d192a          4
## 130     5103 b5b77681c46ac58cd050e60bcd4ad5041a9ab0a9          7
## 131      446 e9ebe46188efacbe1056d82987ff1c70218fa7ae          7
## 132     1151 79ff80ae9a69dd00796e52569e41422466fa0bee          7
## 133     1958 34d08a4878c19584aaf180377f2ea96faa6a6eb1          1
## 134       36 fdab39c467ba6b0fb0288df1176d7dfddd7196d5          6
## 135     5687 859b100fac3ca774ad84450b1fbb65a78fcc6b12          6
## 136    12558 fc5bf033b932cf173633e783fc8c6228114211f8          6
## 137     3725 375ff8cdd64275a916fa24707a67968a910329bb          4
## 138     2244 0f7fb7f347534779ecc6c87498e96b5460a8d652          4
## 139    12558 f77de51aaf0979dd6f56381cf67ba399b4640d28          6
## 140     5684 25fa899b715cd6a9137f6656499f89cd25144029          6
## 141     2244 0fb9ac92dbe52024661c088f71a1134f00567a8b          4
## 142     3725 ee3adbac2da389959410260b280e7df1fd3730df          4
## 143      104 b50bb9480d8a77103fb75b331fd9dd927246939a          2
## 144     1737 fef3864bcb5dd9e60cee27357eff30226116c49b          7
## 145      472 b0c9e9ebb9014f14561e896008087725a2db24b7          7
## 146      176 e7750958da20f585c8b6d5bc4451a5a4305514ba          7
## 147      571 5f8cf93e193d2bcefa5a37ccdf0eefac576861b0          1
## 148      308 3477bc578c4ea5d851dcb934fe2a41e9fd789bb4          7
## 149      303 187eb44b2d6fa154ebf628eba1f18537f64e797c          7
## 150     1071 18dd5ec520fdf589a93d6a7911289c205e1ddf22          6
## 151      608 a6325ed264f9ac9e6518796076c46a1d885cca7a          6
## 152      558 0b08b32b20854ac8a738458728a9ea50c2e04800          4
## 153      325 b7c4ead26d0eb64f1697c030d31682b581c8bb56          4
## 154     5261 f1bed632e89ebc054da44236c4815cdce03ef5ee          7
## 155      857 958fb52e6215626673a5acf6e9289a1b84d11b4a          4
## 156     1510 e110851dfc763d30946f2abedcc2cd571ac357a0          2
## 157      182 0a95303f8059bec08836ecfe02ce3da951150547          4
## 158     1161 6a7c7321b6b72c5285b722f7902d26a2611117cb          4
## 159     9828 3c261626478ce9e6bf2c7f0a8014c5e901d43dc0          4
## 160      148 47ba06cdc92469ac79400fc57acd84ba343ba616          2
## 161      378 7094b097ae2301a1d6a5bd856a193b084cca993d          5
## 162      913 8c87c8346167bac2ba68195a32458aba9b1acfd1          5
## 163     3729 45b597d7efa5693b7e4172b09c0ed2dda3305582          1
## 164     6715 a592eed13e8e9496aedbab63003b965574e46a65          2
## 165     1721 9196c760e3bcae9c9de1e3f87ad81f96bde24325          2
## 166      802 ad580f3864fa8ff69eaca11f6d2e7f9b86378d08          6
## 167      205 72b0346849570f66a77a6856722601e711596cb4          7
## 168       50 6952efda94663d4bd8db09bf6eeb4e67d21ef58c          2
## 169     1482 8dc362703b38b30ac3b68b6401a9b20a58e7d3ef          6
## 170     1190 9e32560437b11b4628b00ccf3d584360f7f7daee          4
## 171      766 46f8235cb585829331799a71aec3f7c082170219          5
## 172      172                                                   5
## 173      320                                                   2
## 174      622                                                   2
## 175      344                                                   5
## 176     6036                                                   7
## 177      511                                                   7
## 178     1274                                                   7
## 179     4601                                                   3
## 180     1097                                                   5
## 181     3774                                                   4
## 182      478                                                   4
## 183      178                                                   5
## 184      341                                                   5
## 185      950                                                   4
## 186     4269                                                   8
## 187     4269                                                   8
## 188     4216                                                   8
## 189     4216                                                   8
## 190      270                                                   4
## 191     2564                                                   4
## 192     2579                                                   4
## 193      777                                                   5
## 194      350                                                   1
## 195     1395                                                   4
## 196    17326                                                   7
## 197      469                                                   5
## 198      757                                                   1
## 199      601                                                   4
## 200    10850                                                   4
## 201    10424                                                   4
## 202      106                                                   5
## 203     1818                                                   2
## 204      118                                                   3
## 205      338                                                   2
## 206      888                                                   4
## 207      529                                                   3
## 208      233                                                   2
## 209     1748                                                   4
## 210      303                                                   2
## 211      511                                                   5
## 212     5407                                                   3
## 213      474                                                   3
## 214     1147                                                   3
## 215    21845                                                   2
## 216     5271                                                   4
## 217     2388                                                   4
## 218       72                                                   5
dbs_pathway <- c("BioPlanet_2019", "WikiPathway_2021_Human", "KEGG_2021_Human")
if (websiteLive) {
    enriched_pathway_brain <- enrichr(brain_total, dbs_pathway)
}
## Uploading data to Enrichr... Done.
##   Querying BioPlanet_2019... Done.
##   Querying WikiPathway_2021_Human... Done.
##   Querying KEGG_2021_Human... Done.
## Parsing results... Done.
if (websiteLive) plotEnrich(title = "Enriched terms of BioPlanet 2019 database", enriched_pathway_brain[[1]], showTerms = 5, numChar = 100, y = "Count", orderBy = "P.value")

if (websiteLive) plotEnrich(title = "Enriched terms of WikiPathway 2021 Human database", enriched_pathway_brain[[2]], showTerms = 5, numChar = 100, y = "Count", orderBy = "P.value")

if (websiteLive) plotEnrich(title = "Enriched terms of KEGG 2021 Human database", enriched_pathway_brain[[3]], showTerms = 5, numChar = 100, y = "Count", orderBy = "P.value")
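The plots above use `y = "Count"` and `orderBy = "P.value"`. As a rough sketch of what those two quantities are, the snippet below parses an `Overlap` column of the form "k/n" into a gene count and a ratio, then ranks the terms by p-value. It assumes that the bar height corresponds to the numerator of `Overlap`; the table `toy` and all of its values are invented for illustration and are not results of this analysis.

```r
# Toy table mimicking some columns of an enrichr() result; values are invented.
toy <- data.frame(
  Term = c("term A", "term B", "term C"),
  Overlap = c("12/200", "5/40", "30/1000"),
  P.value = c(1e-6, 3e-4, 2e-3),
  stringsAsFactors = FALSE
)

# Split "k/n" into the number of input genes hit (k) and the term size (n).
parts <- strsplit(toy$Overlap, "/", fixed = TRUE)
toy$Count <- as.integer(vapply(parts, `[`, character(1), 1))
toy$TermSize <- as.integer(vapply(parts, `[`, character(1), 2))
toy$Ratio <- toy$Count / toy$TermSize

# Rank by raw p-value, mirroring orderBy = "P.value" in the plots.
toy[order(toy$P.value), c("Term", "Count", "Ratio", "P.value")]
```

A term with a small count can still have a high ratio (here "term B"), so the two views are not interchangeable.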

Liver

if (websiteLive) {
    enriched_pathway_liver <- enrichr(liver_total, dbs_pathway)
}
## Uploading data to Enrichr... Done.
##   Querying BioPlanet_2019... Done.
##   Querying WikiPathway_2021_Human... Done.
##   Querying KEGG_2021_Human... Done.
## Parsing results... Done.
if (websiteLive) plotEnrich(title = "Enriched terms of BioPlanet 2019 database", enriched_pathway_liver[[1]], showTerms = 5, numChar = 40, y = "Count", orderBy = "P.value")

if (websiteLive) plotEnrich(title = "Enriched terms of WikiPathway 2021 Human database", enriched_pathway_liver[[2]], showTerms = 5, numChar = 100, y = "Count", orderBy = "P.value")

if (websiteLive) plotEnrich(title = "Enriched terms of KEGG 2021 Human database", enriched_pathway_liver[[3]], showTerms = 5, numChar = 100, y = "Count", orderBy = "P.value")

Lung

if (websiteLive) {
    enriched_pathway_lung <- enrichr(lung_total, dbs_pathway)
}
## Uploading data to Enrichr... Done.
##   Querying BioPlanet_2019... Done.
##   Querying WikiPathway_2021_Human... Done.
##   Querying KEGG_2021_Human... Done.
## Parsing results... Done.
if (websiteLive) plotEnrich(title = "Enriched terms of BioPlanet 2019 database", enriched_pathway_lung[[1]], showTerms = 5, numChar = 100, y = "Count", orderBy = "P.value")

if (websiteLive) plotEnrich(title = "Enriched terms of WikiPathway 2021 Human database", enriched_pathway_lung[[2]], showTerms = 5, numChar = 100, y = "Count", orderBy = "P.value")

if (websiteLive) plotEnrich(title = "Enriched terms of KEGG 2021 Human database", enriched_pathway_lung[[3]], showTerms = 5, numChar = 100, y = "Count", orderBy = "P.value")
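Since the same three databases are queried for every tissue, the plot titles can be built from `dbs_pathway` itself so that they always name the database that was actually used. The following is only a sketch: `make_title` is a hypothetical helper, not part of enrichR, and the commented loop assumes that the list returned by `enrichr()` is named after the queried databases.

```r
dbs_pathway <- c("BioPlanet_2019", "WikiPathway_2021_Human", "KEGG_2021_Human")

# Hypothetical helper: turn a database identifier into a plot title.
make_title <- function(db) {
  paste("Enriched terms of", gsub("_", " ", db, fixed = TRUE), "database")
}

# Named character vector: one title per database identifier.
titles <- vapply(dbs_pathway, make_title, character(1))
print(unname(titles))

# With a live connection and an enrichr() result, the plots could be drawn as:
# for (db in dbs_pathway) {
#   print(plotEnrich(enriched_pathway_lung[[db]], title = titles[[db]],
#                    showTerms = 5, numChar = 100, y = "Count", orderBy = "P.value"))
# }
```

Deriving the titles this way means that changing the entries of `dbs_pathway` updates every plot title at once.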

Further analysis with Human Gene Atlas

dbs_celltypes <- c("Human_Gene_Atlas")
if (websiteLive) {
    enriched_celltypes_brain <- enrichr(brain_total, dbs_celltypes)
}
## Uploading data to Enrichr... Done.
##   Querying Human_Gene_Atlas... Done.
## Parsing results... Done.
if (websiteLive) plotEnrich(title = "Brain - Human Gene Atlas database", enriched_celltypes_brain[[1]], showTerms = 5, numChar = 40, y = "Count", orderBy = "P.value")

dbs_celltypes <- c("Human_Gene_Atlas")
if (websiteLive) {
    enriched_celltypes_liver <- enrichr(liver_total, dbs_celltypes)
}
## Uploading data to Enrichr... Done.
##   Querying Human_Gene_Atlas... Done.
## Parsing results... Done.
if (websiteLive) plotEnrich(title = "Liver - Human Gene Atlas database", enriched_celltypes_liver[[1]], showTerms = 5, numChar = 40, y = "Count", orderBy = "P.value")

dbs_celltypes <- c("Human_Gene_Atlas")
if (websiteLive) {
    enriched_celltypes_lung <- enrichr(lung_total, dbs_celltypes)
}
## Uploading data to Enrichr... Done.
##   Querying Human_Gene_Atlas... Done.
## Parsing results... Done.
if (websiteLive) plotEnrich(title = "Lung - Human Gene Atlas database", enriched_celltypes_lung[[1]], showTerms = 5, numChar = 40, y = "Count", orderBy = "P.value")
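The bar plots rank terms by raw p-value. As a complementary check, the terms that pass an adjusted p-value cutoff could be tabulated for each tissue. The sketch below assumes the result table has the `Term`, `Overlap` and `Adjusted.P.value` columns; `top_terms` is a hypothetical helper, the cutoff of 0.05 is an arbitrary choice, and the toy table contains invented values rather than results of this analysis.

```r
# Hypothetical helper: keep terms below an adjusted p-value cutoff, return the n best.
top_terms <- function(res, n = 5, alpha = 0.05) {
  res <- res[res$Adjusted.P.value < alpha, , drop = FALSE]
  res <- res[order(res$Adjusted.P.value), , drop = FALSE]
  head(res[, c("Term", "Overlap", "Adjusted.P.value")], n)
}

# Toy table with the same column names as an enrichr() result; values are invented.
toy <- data.frame(
  Term = c("tissue X", "tissue Y", "tissue Z"),
  Overlap = c("40/300", "3/150", "10/220"),
  Adjusted.P.value = c(1e-12, 0.4, 0.01),
  stringsAsFactors = FALSE
)

top_terms(toy)
```

With a live connection, the same helper could be applied to `enriched_celltypes_brain[[1]]`, `enriched_celltypes_liver[[1]]` and `enriched_celltypes_lung[[1]]`.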

Conclusions

In conclusion, each tissue is correctly identified by its up-regulated genes. In addition, this workflow is robust enough to find differentially expressed genes and to recognise the tissue of origin even in unfiltered data, that is, without removing features such as pseudogenes, non-canonical chromosomes and rRNA genes.